Job submission guide (pip3)
Introduction
This is a quickstart for submitting your job to the cluster after you have transferred your code onto the server.
The researchlong queue will preempt jobs whenever there are insufficient resources. Append the #SBATCH --requeue parameter to your sbatch file to re-submit the job automatically if it gets preempted.
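For example, the directive sits alongside the other #SBATCH lines in the sbatch file:

```shell
# Automatically re-queue this job if it gets preempted
#SBATCH --requeue
```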
Submitting Python Script to Cluster
You will need to convert the .ipynb file to a .py file first. Either:
- Open the Jupyter Notebook interface and, under File > Download as, select Python
- Or, on your local machine, navigate to the directory containing the notebook and execute the following command:
jupyter nbconvert --to script <NOTEBOOK NAME>.ipynb
Prerequisites
Before following this guide, ensure you have:
- Logged into the cluster through SSH
- Transferred your project files onto the cluster
- Downloaded a copy of the shell script template below.
Download pip3 job submission shell script template
Getting your account quota information
- Log into the GPU cluster
- Execute the command "myinfo" in your terminal/PowerShell
[IS000G3@origami ~]$ myinfo
================ Account parameters ================
Description | Value
---------------------------------------------
Account name | is000
List of Assigned Partition | tester
List of Assigned QOS | is000qos
---------------------------------------------
... output truncated
Copy down the values of
- Account name (Line 6)
- List of Assigned Partition (Line 7)
- List of Assigned QOS (Line 8)
Amending the template file
- Open the downloaded shell script template with your favourite editor
- Amend line 26 of the file and replace it with the partition value you copied down earlier
# The partition you've been assigned
#SBATCH --partition=tester
- Amend line 27 of the file and replace it with the account value you copied down earlier
# The account you've been assigned
#SBATCH --account=is000
- Amend line 28 of the file and replace it with the QOS value you copied down earlier
# What is the QOS assigned to you? Check with myinfo command
#SBATCH --qos=is000qos
- Amend line 29 of the file and replace it with your email address. (To enter multiple email addresses, separate them with commas)
# Who should receive the email notifications
#SBATCH --mail-user=exampleuser1@scis.smu.edu.sg,exampleuser2@scis.smu.edu.sg
- Amend line 30 and give your job a title/name
# Give the job a name
#SBATCH --job-name=YourName
Select the right modules to load.
If you only require Python, there is no need to amend this portion of the code.
- If you are using ==Tensorflow==, you should amend lines 39 and 40 as shown
# Purge the environment, load the modules we require.
# Refer to https://violet.scis.dev/docs/Advanced%20settings/module for more information
module purge
module load Python/3.11.7
module load cuDNN/8.9.7.29-CUDA-12.3.2
Tensorflow Projects: Please refer to the Tensorflow section of the build config guide
- If you are using ==PyTorch==, you should amend lines 39 and 40 as shown
# Purge the environment, load the modules we require.
# Refer to https://violet.scis.dev/docs/Advanced%20settings/module for more information
module purge
module load Python/3.11.7
module load CUDA/12.4.0
PyTorch Projects: Please refer to the PyTorch section of the build config guide
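If you want to confirm inside the job that the right modules were loaded, a standard Environment Modules command can be added after the module load lines (a sketch; its output appears in the job log):

```shell
# Print the currently loaded modules into the job log
module list
```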
- This command creates a Python virtual environment to reduce package conflicts. It only needs to be run once and will be ignored if you have an existing virtual environment.
This command creates a virtualenv called myenv in your home directory.
# Create a virtual environment
python3.11 -m venv ~/myenv
- The virtual environment must be activated every time, before pip packages are downloaded.
Users with multiple projects: If you have multiple projects, you need to activate the right virtual environment.
Tip: The ~ refers to your home directory.
# You will need to have a virtual environment created for this command to work.
source ~/myenv/bin/activate
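A quick sanity check after activation (a sketch; deactivate is the standard command to leave the environment when you are done):

```shell
# After activation, python3 should resolve to the copy inside the venv
command -v python3   # expect a path ending in myenv/bin/python3
# Leave the environment when finished
deactivate
```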
- Include the libraries/packages you need to install for your project using pip3
Tip: After the first successful run of the script, you may choose to remove these commands to speed up your job submission.
# If you require any packages, install them as usual before the srun job submission.
pip3 install numpy
pip3 install scikit-learn
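If your project already lists its dependencies in a requirements.txt file (the filename here is an assumption about your project layout), a single install line is easier to maintain than installing packages one by one:

```shell
# Install every package listed in the project's requirements file
pip3 install -r requirements.txt
```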
- Replace the template path with the Python script you would like the cluster to execute. An example is provided below:
# execute your job with the srun command
srun --gres=gpu:1 python3 <file path>/myScript.py
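For instance, if the converted script were saved at ~/fyp/myScript.py (a hypothetical path; substitute your own), the line would read:

```shell
# Run the script on one allocated GPU
srun --gres=gpu:1 python3 ~/fyp/myScript.py
```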
Submitting the script
- Locate the uploaded script and give it executable permissions
[IS000G3@origami ~]$ chmod +x sbatchTemplatePython.sh
- Submit the script to the cluster for processing
[IS000G3@origami ~]$ sbatch sbatchTemplatePython.sh
Submitted batch job 1334
- The logs/output for this command will appear in the same directory where the sbatch command was executed. The file name follows the format <USERNAME>.<JOBID>.out
[IS000G3@origami ~]$ ls -lrt
-rw-rw-r--. 1 IS000G3 IS000G3 8 Feb 16 15:05 IS000G3.1334.out
- To show the results of the output, you can open and read the file with the cat command
[IS000G3@origami ~]$ cat IS000G3.1334.out
...output redacted
You will receive an email notification when your job begins, ends, or fails.
Other useful commands
View the status of your jobs
Use the myqueue command
[IS000G3@origami ~]$ myqueue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
1329 tester mlProje- IS000G3 R 3:36:23 1 mustang
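If a job needs to be stopped before it finishes, the job id from the myqueue output can be passed to Slurm's standard scancel command (assuming scancel is exposed to users on this cluster):

```shell
# Cancel job 1329, using the JOBID column from myqueue
scancel 1329
```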
View detailed information about a running/pending job
The myjob <jobid> command fetches information on jobs which are currently running or have completed within the last 5 minutes.
This command is useful to check on the resources that were allocated to the job
You can use the myqueue command to get the job id of your existing jobs
[IS000G3@origami fyp]$ myjob 1329
JobId=1329 JobName=mlProjectFinalFYP
UserId=IS000G3(1008) GroupId=IS000G3(1012) MCS_label=N/A
Priority=4294901739 Nice=0 Account=is000 QOS=is000qos
JobState=RUNNING Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
RunTime=03:37:52 TimeLimit=06:00:00 TimeMin=N/A
...redacted
View past jobs
The mypastjob <number of days> command will show a history of the jobs that were executed in the past N days. A user may only fetch up to 30 days of past jobs.
[IS000G3@origami fyp]$ mypastjob 2